Classification of English language learner writing errors using a parallel corpus with SVM
Abstract
To overcome mistakes, learners need feedback that prompts them to reflect on their errors. This matters in education systems because how effectively a system finds errors can affect learning: without error detection, learners cannot be given appropriate guidance. Traditionally, finding errors in writing takes considerable time and effort. Our long-term research goal is to create tools that help learners, especially autonomous learners, become more aware of their errors and reflect on them. As part of this research, we propose using a classifier to automatically analyse and identify errors in foreign language writing. For the experiment in this paper, we collected random sentences written by foreign language learners from the Lang-8 website. We manually classified the sentences into predefined error categories and used the result as training data for a support vector machine (SVM) classifier. Because manual classification of training data is time-consuming, the classifier is intended to accelerate the generation of further training data.
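The pipeline described in the abstract (manually labelled sentences, then an SVM trained on them) can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the three error categories, the example sentences, and the edit-based features are all hypothetical, and the linear SVM is trained with a simple Pegasos-style subgradient method rather than whatever solver the paper used. Features are taken from the difference between a learner sentence and its correction, in the spirit of the parallel corpus named in the title.

```python
import difflib
import random

# Hypothetical labelled data: (learner sentence, corrected sentence, error category).
# Sentences and category names are invented for illustration, not taken from Lang-8.
TRAIN = [
    ("I bought book yesterday", "I bought a book yesterday", "article"),
    ("She is teacher", "She is a teacher", "article"),
    ("I saw cat in garden", "I saw a cat in the garden", "article"),
    ("I arrived to the station", "I arrived at the station", "preposition"),
    ("She is good in math", "She is good at math", "preposition"),
    ("We discussed about the plan", "We discussed the plan", "preposition"),
    ("Yesterday I go to school", "Yesterday I went to school", "tense"),
    ("He eat lunch an hour ago", "He ate lunch an hour ago", "tense"),
    ("I see him last week", "I saw him last week", "tense"),
]


def diff_features(learner, corrected):
    """Count the tokens deleted from / inserted into the learner sentence."""
    a, b = learner.lower().split(), corrected.lower().split()
    feats = {}
    matcher = difflib.SequenceMatcher(a=a, b=b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        for tok in a[i1:i2]:
            feats["del:" + tok] = feats.get("del:" + tok, 0.0) + 1.0
        for tok in b[j1:j2]:
            feats["ins:" + tok] = feats.get("ins:" + tok, 0.0) + 1.0
    return feats


def score(w, x):
    """Dot product of a sparse weight vector and a sparse feature vector."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_binary_svm(examples, epochs=50, lam=0.01, seed=0):
    """Linear SVM without a bias term: Pegasos-style subgradient descent
    on the L2-regularised hinge loss. Labels must be -1 or +1."""
    rng = random.Random(seed)
    w = {}
    t = 0
    for _ in range(epochs):
        order = list(examples)
        rng.shuffle(order)
        for x, y in order:
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = y * score(w, x)       # computed before the update
            shrink = 1.0 - 1.0 / t         # regularisation step (eta * lam = 1/t)
            for k in w:
                w[k] *= shrink
            if margin < 1.0:               # hinge loss is active for this example
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def train_one_vs_rest(data):
    """Train one binary SVM per error category."""
    feats = [(diff_features(l, c), lab) for l, c, lab in data]
    labels = sorted({lab for _, lab in feats})
    return {
        lab: train_binary_svm([(x, 1 if y == lab else -1) for x, y in feats])
        for lab in labels
    }


def predict(models, learner, corrected):
    """Return the category whose SVM gives the highest score."""
    x = diff_features(learner, corrected)
    return max(models, key=lambda lab: score(models[lab], x))


models = train_one_vs_rest(TRAIN)
print(predict(models, "He is doctor", "He is a doctor"))
```

In the workflow the abstract proposes, predictions like this one would be reviewed by a person and, once confirmed, added to the training data, which is how the classifier could speed up further labelling.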
Similar resources
Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners
Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...
How textbooks (and learners) get it wrong: A corpus study of modal auxiliary verbs
Many elements contribute to the relative difficulty in acquiring specific aspects of English as a foreign language (Goldschneider & DeKeyser, 2001). Modal auxiliary verbs (e.g. could, might) are examples of a structure that is difficult for many learners. Not only are they particularly complex semantically, but especially in the Malaysian context ...
Metadiscourse Markers in a Corpus of Learner Language: The Case of Iranian EFL Learners
Different issues have been probed in learner corpus research since the late 1980s. However, given the importance of metadiscourse markers (MDMs) in signposting academic discourse, their use in Iranian EFL learners' academic essays is an area of research in need of more serious analysis. Contributing to this line of investigation, this paper reports a corpus-based study of the use of MDMs i...
Error Analysis of Taiwanese University Students’ English Essay Writing: A Longitudinal Corpus Study
Writing is considered one of the most difficult skills in EFL/ESL. Thus, meticulous recognition and classification of students’ errors in certain contexts is a worthwhile endeavor which provides us with both diagnostic and prognostic power. Accordingly, a total of 430 students in 15 English writing classes held during 12 consecutive semesters in a private university in central Taiwan were the s...
Learning with Learner Corpora: using the TLE for Native Language Identification
This study investigates the usefulness of the Treebank of Learner English (TLE) when applied to the task of Native Language Identification (NLI). The TLE is effectively a parallel corpus of Standard/Learner English, as there are two versions: one based on original learner essays, and the other an error-corrected version. We use the corpus to explore how useful a parser trained on ungrammatical ...
Journal title:
- I. J. Knowledge and Web Intelligence
Volume 5
Publication date: 2014